Multilingual chief complaint classification for syndromic surveillance: An experiment with Chinese chief complaints
Abstract
PURPOSE Syndromic surveillance aims at early detection of disease outbreaks. An important data source for syndromic surveillance is free-text chief complaints (CCs), which may be recorded in different languages. For automated syndromic surveillance, CCs must be classified into predefined syndromic categories to facilitate subsequent data aggregation and analysis. Although syndromic surveillance is largely an international effort, existing CC classification systems do not provide adequate support for processing CCs recorded in languages other than English. This paper reports a multilingual CC classification effort, focusing on CCs recorded in Chinese.
METHODS We propose a novel Chinese CC classification system that combines a Chinese-English translation module with an existing English CC classification approach. A set of 470 Chinese key phrases was extracted from about one million Chinese CC records using statistical methods. Based on the extracted key phrases, the system translates Chinese text into English and classifies the translated CCs into syndromic categories using an existing English CC classification system.
RESULTS In a computational experiment using real-world CC records, our approach performs significantly better than alternative approaches using a bilingual dictionary and a general-purpose machine translation system in terms of positive predictive value (PPV or precision), sensitivity (recall), specificity, and F measure (the harmonic mean of PPV and sensitivity).
CONCLUSIONS Our design provides satisfactory performance in classifying Chinese CCs into syndromic categories for public health surveillance. The overall design of our system also points to a potentially fruitful direction for multilingual CC systems that need to handle languages beyond English and Chinese.
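The four evaluation measures named in the abstract can be computed from the counts of a binary confusion matrix for a single syndromic category. The following is a minimal sketch, not the authors' evaluation code: the function name `classification_metrics`, the dictionary keys, and the example counts are all hypothetical and chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute PPV, sensitivity, specificity and F measure for one category.

    tp, fp, tn, fn are the true-positive, false-positive, true-negative and
    false-negative counts (hypothetical inputs, not data from the paper).
    A ratio with a zero denominator is reported as 0.0.
    """
    ppv = tp / (tp + fp) if (tp + fp) else 0.0          # precision
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = ppv + sensitivity
    # F measure: harmonic mean of PPV and sensitivity
    f_measure = 2 * ppv * sensitivity / denom if denom else 0.0
    return {
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }


# Made-up counts for illustration only
example = classification_metrics(tp=8, fp=2, tn=85, fn=5)
```

Because the F measure is a harmonic mean, it stays low unless both PPV and sensitivity are high, which is why it is often reported alongside the two individual values.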
Similar resources
Assessing the performance of American chief complaint classifiers on Victorian syndromic surveillance data
Syndromic surveillance systems aim to support early detection of salient disease outbreaks, and to shed timely light on the size and spread of pandemic outbreaks. They can also be used more generally to monitor disease trends and provide reassurance that an outbreak has not occurred. One commonly used technique for syndromic surveillance is concerned with classifying Emergency Department data, ...
Automated Syndromic Classification of Chief Complaint Records
Syndromic surveillance, a medical surveillance approach that bins data into broadly defined syndrome groups, has drawn increasing interest in recent years for the early detection of disease outbreaks for both public health and bioterrorism defense. Emergency department chief complaint records are an attractive data source for syndromic surveillance owing to their timeliness and ready availabili...
Evaluation of preprocessing techniques for chief complaint classification
OBJECTIVE To determine whether preprocessing chief complaints before automatically classifying them into syndromic categories improves classification performance. METHODS We preprocessed chief complaints using two preprocessors (CCP and EMT-P) and evaluated whether classification performance increased for a probabilistic classifier (CoCo) or for a keyword-based classifier (modification of the...
Syndromic surveillance on the Victorian chief complaint data set using a hybrid statistical and machine learning technique
Emergency Department Chief Complaints have been used to detect the size and the spread of disease outbreaks in the past. Chief complaints are readily available in digital formats and provide a good data source for syndromic surveillance. This paper reports our findings on the identification of the distribution of a few syndromes over time using the Victorian Syndromic Surveillance (SynSurv) dat...
Ontology-enhanced automatic chief complaint classification for syndromic surveillance
Emergency department free-text chief complaints (CCs) are a major data source for syndromic surveillance. CCs need to be classified into syndromic categories for subsequent automatic analysis. However, the lack of a standard vocabulary and high-quality encodings of CCs hinder effective classification. This paper presents a new ontology-enhanced automatic CC classification approach. Exploiting s...
Journal title:
- International journal of medical informatics
Volume 78, Issue 5
Pages -
Publication date 2009